Word Alignment and the Expectation-Maximization Algorithm
نویسنده
چکیده
The purpose of this tutorial is to give you an example of how to take a simple discrete probabilistic model and derive the expectation maximization updates for it and then turn them into code. We give some examples and identify the key ideas that make the algorithms work. These are meant to be as intuitive as possible, but for those curious about the underlying mathematics, we also provide some derivations, and point the reader to additional tutorials when mathematical depth goes beyond the scope of the tutorial. But you needn’t follow these derivations in detail to understand the main concepts, so whether you try to is to is up to you. We’ll consider IBM Model 1, which has been the subject of other similar tutorials [Collins(2011), Knight(1999)]. A great deal of the material here is inspired, either directly or indirectly, by these papers, as well as the concise original description in Sections 4 and 4.1 of [Brown et al.(1993)], and of course, Chapter 4 of Philipp Koehn’s textbook (2010) . Everyone learns differently, though, so I urge you to take a look at one of those if you’re still confused after reading this one. The basic plan is to learn a conditional probabilistic model of a French sentence f given an English sentence e, which we’ll denote pθ(f|e). Subscript θ refers to the set of parameters in the model; we’ll describe what it looks like shortly. We also have a dataset D of N sentence pairs that are known to be translations of each other, D = {(f(1), e(1))...(f(N), e(N))}, where each superscript (n) indexes a different sentence pair. The object of our model will be to uncover the hidden word-to-word correspondences in these translation pairs. We will learn the model from this data, and once we’ve learned it, we’ll use it to predict the existence of the missing word alignments.
منابع مشابه
Semi-Supervised Training for Statistical Word Alignment
We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine tra...
متن کاملThe Key Approach to Translation: Word Alignment Models
This paper focuses on a key aspect of Statistical Machine Translation: word alignment. Various word alignment models are presented, first differentiating between methods and then highlighting the preferred method. A partially detailed mathematical explanation is provided for each model as well as a brief implementation of the Expectation Maximization Algorithm (EM Algorithm) for later models. F...
متن کاملISI's Participation in the Romanian-English Alignment Task
We discuss results on the shared task of Romanian-English word alignment. The baseline technique is that of symmetrizing two word alignments automatically generated using IBM Model 4. A simple vocabulary reduction technique results in an improvement in performance. We also report on a new alignment model and a new training algorithm based on alternating maximization of likelihood with minimizat...
متن کاملWord Alignment via Submodular Maximization over Matroids
We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansa...
متن کاملStatistical Approach With Factored Translation Models For Indian Languages
Factored translation models are an extension to phrase based statistical translation models which integrate additional annotation at word level. Here we present a study of statistical models and approaches to translate Hindi to English. Experiments were also conducted on alignment models using various word groupings and using GIZA++ to predict their English translations and fertility. TAJ A new...
متن کاملGraph-Based Word Alignment for Clinical Language Evaluation
Among the more recent applications for natural language processing algorithms has been the analysis of spoken language data for diagnostic and remedial purposes, fueled by the demand for simple, objective, and unobtrusive screening tools for neurological disorders such as dementia. The automated analysis of narrative retellings in particular shows potential as a component of such a screening to...
متن کامل